Extract Me If You Can: Abusing PDF Parsers in Malware Detectors

Authors

Curtis Carmony

Xunchao Hu

Heng Yin

Abhishek Vasisht Bhaskar

Mu Zhang

Abstract

Owing to the popularity of the PDF format and the continued exploitation of Adobe Reader, the detection of malicious PDFs remains a concern. All existing detection techniques rely on a PDF parser to some extent, while the complexity of the PDF format leaves ample room for parser confusion. To quantify the difference between these parsers and Adobe Reader, we create a reference JavaScript extractor by directly tapping into Adobe Reader at locations identified through a mostly automatic binary analysis technique. By comparing the output of this reference extractor against that of several open-source JavaScript extractors on a large data set obtained from VirusTotal, we identify hundreds of samples from which existing extractors fail to extract JavaScript. By analyzing these samples, we identify several weaknesses in each of these extractors. Based on these lessons, we apply several obfuscations to a malicious PDF sample, which then successfully evades all the malware detectors tested. We call this evasion technique a PDF parser confusion attack. Lastly, we demonstrate that the reference JavaScript extractor improves the accuracy of existing JavaScript-based classifiers and show how it can be used to mitigate these parser limitations in a real-world setting.
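The comparison step described in the abstract can be illustrated with a minimal sketch. It is not the authors' tooling: the function name `find_extraction_failures`, the dictionary layout (sample ID mapped to a list of extracted scripts), and the sample data are all hypothetical, chosen only to show the idea of flagging samples where a reference extractor finds JavaScript and another extractor finds none.

```python
from typing import Dict, List, Set


def find_extraction_failures(
    reference: Dict[str, List[str]],
    candidate: Dict[str, List[str]],
) -> Set[str]:
    """Return IDs of samples where the reference extractor found
    JavaScript but the candidate extractor found nothing.

    Both arguments map a sample ID to the list of scripts extracted
    from that sample (hypothetical layout, for illustration only).
    """
    failures = set()
    for sample_id, ref_scripts in reference.items():
        if not ref_scripts:
            # The reference found no JavaScript, so there is nothing to miss.
            continue
        if not candidate.get(sample_id):
            # Missing entry or empty list: the candidate extracted nothing.
            failures.add(sample_id)
    return failures


# Made-up data: the candidate misses the script in "b.pdf".
reference_output = {
    "a.pdf": ["app.alert(1);"],
    "b.pdf": ["var x = unescape('%u9090');"],
    "c.pdf": [],
}
candidate_output = {
    "a.pdf": ["app.alert(1);"],
    "b.pdf": [],
    "c.pdf": [],
}

missed = find_extraction_failures(reference_output, candidate_output)
```

This sketch only checks whether anything was extracted at all; comparing the content of the extracted scripts would need a stricter notion of equality, which the abstract does not specify.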

Similar resources

Unsupervised Anomaly-Based Malware Detection Using Hardware Features

Recent works have shown promise in using microarchitectural execution patterns to detect malware programs. These detectors belong to a class of detectors known as signature-based detectors, as they catch malware by comparing a program’s execution pattern (signature) to execution patterns of known malware programs. In this work, we propose a new class of detectors, anomaly-based hardware malware ...

Hardening Classifiers against Evasion: the Good, the Bad, and the Ugly

Machine learning is widely used in security applications, particularly in the form of statistical classification aimed at distinguishing benign from malicious entities. Recent research has shown that such classifiers are often vulnerable to evasion attacks, whereby adversaries change behavior to be categorized as benign while preserving malicious functionality. Research into evasion attacks has...

Advanced Persistent Threat: Malicious Code Hidden in PDF Documents

In recent years, the Advanced Persistent Threat (APT) has become a very popular means of stealing information from specific targets by exploiting vulnerabilities on the targets’ machines. An APT involves a set of complex phases, which are difficult to detect and are often initiated with spear phishing in the early stage of the intrusion. To help defend against APT, it is important to study the malformed Portable Documen...

Correcting Proofs via PDF Commenting

The “paperless office” often works better in theory than the real world, but it is becoming feasible to mark text corrections electronically. The free Adobe Reader provides a convenient means for doing so via “PDF Commenting” (also known as “Acrobat Commenting”) once this capability has been enabled for a given PDF. The basic commenting process is quite simple. You open the enabled PDF with Ado...

Malware Normalization

Malware is code designed for a malicious purpose, such as obtaining root privilege on a host. A malware detector identifies malware and thus prevents it from adversely affecting a host. In order to evade detection by malware detectors, malware writers use various obfuscation techniques to transform their malware. There is strong evidence that commercial malware detectors are susceptible to thes...

Publication date: 2016
